The Exploration-Exploitation Tradeoff in Sequential Decision Making Problems
Abstract
Sequential decision-making problems often require an agent to act in an environment where data are noisy or only partially observed. The agent must learn how different actions relate to different rewards, and must therefore balance exploration and exploitation in an effective strategy. In this report, sequential decision-making problems are considered through extensions of the multi-armed bandit framework. First, the bandit problem is extended to a Multi-Agent System (MAS), in which agents control individual arms but can communicate potentially useful information with each other. This framework allows for a better understanding of the exploration-exploitation tradeoff in scenarios where multiple agents interact in a noisy environment. To this end, we present a novel strategy for action and communication decisions, and we demonstrate the benefits of such a strategy empirically. This motivates a theoretical analysis of one-armed bandit problems, developing ideas of how different strategies are optimally tuned. Specifically, the expected rewards of ε-greedy strategies are derived, together with proofs governing their optimal tuning.
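The ε-greedy strategies analysed in the report can be illustrated with a minimal sketch on a Bernoulli multi-armed bandit. The arm success rates, ε value, and horizon below are illustrative assumptions, not the report's experimental setup:

```python
import random

def epsilon_greedy(true_means, epsilon=0.1, horizon=1000, seed=0):
    """Run an epsilon-greedy agent on a Bernoulli multi-armed bandit.

    With probability epsilon the agent explores (pulls a random arm);
    otherwise it exploits the arm with the highest empirical mean.
    true_means are the (unknown to the agent) success probabilities.
    """
    rng = random.Random(seed)
    k = len(true_means)
    counts = [0] * k          # number of pulls per arm
    estimates = [0.0] * k     # empirical mean reward per arm
    total_reward = 0.0
    for _ in range(horizon):
        if rng.random() < epsilon:
            arm = rng.randrange(k)                           # explore
        else:
            arm = max(range(k), key=lambda a: estimates[a])  # exploit
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        # incremental update of the empirical mean for the pulled arm
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward
    return total_reward, estimates

# Illustrative two-armed bandit with success rates 0.3 and 0.7
total, est = epsilon_greedy([0.3, 0.7], epsilon=0.1, horizon=5000)
```

Lowering ε reduces the exploration cost once the best arm is found, but raises the risk of never identifying it, which is exactly the tuning tradeoff the report's derivations address.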
Related resources
Exploration-Free Policies in Dynamic Pricing and Online Decision-Making
The growing availability of data has enabled practitioners to tailor decisions at the individual level. This involves learning a model of decision outcomes conditional on individual-specific covariates or features. Recently, contextual bandits have been introduced as a framework to study these online and sequential decision-making problems. This literature predominantly focuses on algorithms that ba...
University of Alberta: New Representations and Approximations for Sequential Decision Making Under Uncertainty
This dissertation research addresses the challenge of scaling up algorithms for sequential decision making under uncertainty. In my dissertation, I developed new approximation strategies for planning and learning in the presence of uncertainty while maintaining useful theoretical properties that allow larger problems to be tackled than is practical with exact methods. In particular, my research...
Psychological Models of Human and Optimal Performance in Bandit Problems (Action Editor: Andrew Howes)
In bandit problems, a decision-maker must choose between a set of alternatives, each of which has a fixed but unknown rate of reward, to maximize their total number of rewards over a sequence of trials. Performing well in these problems requires balancing the need to search for highly-rewarding alternatives, with the need to capitalize on those alternatives already known to be reasonably good. ...
Correlational Dueling Bandits with Application to Clinical Treatment in Large Decision Spaces
We consider sequential decision making under uncertainty, where the goal is to optimize over a large decision space using noisy comparative feedback. This problem can be formulated as a K-armed Dueling Bandits problem, where K is the total number of decisions. When K is very large, existing dueling bandits algorithms suffer huge cumulative regret before converging on the optimal arm. This paper s...
Journal:
Volume/Issue:
Pages: -
Publication date: 2009